74 research outputs found
Identifying Patch Correctness in Test-Based Program Repair
Test-based automatic program repair has attracted a lot of attention in
recent years. However, the test suites in practice are often too weak to
guarantee correctness and existing approaches often generate a large number of
incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel
approach that heuristically determines the correctness of the generated
patches. The core idea is to exploit the behavior similarity of test case
executions. The passing tests on original and patched programs are likely to
behave similarly while the failing tests on original and patched programs are
likely to behave differently. Also, if two tests exhibit similar runtime
behavior, the two tests are likely to have the same test results. Based on
these observations, we generate new test inputs to enhance the test suites and
use their behavior similarity to determine patch correctness.
Our approach is evaluated on a dataset consisting of 139 patches generated
from existing program repair systems including jGenProg, Nopol, jKali, ACS and
HDRepair. Our approach successfully prevented 56.3\% of the incorrect patches
to be generated, without blocking any correct patches.Comment: ICSE 201
Faster Mutation Analysis via Equivalence Modulo States
Mutation analysis has many applications, such as asserting the quality of
test suites and localizing faults. One important bottleneck of mutation
analysis is scalability. The latest work explores the possibility of reducing
the redundant execution via split-stream execution. However, split-stream
execution is only able to remove redundant execution before the first mutated
statement.
In this paper we try to also reduce some of the redundant execution after the
execution of the first mutated statement. We observe that, although many
mutated statements are not equivalent, the execution result of those mutated
statements may still be equivalent to the result of the original statement. In
other words, the statements are equivalent modulo the current state.
In this paper we propose a fast mutation analysis approach, AccMut. AccMut
automatically detects the equivalence modulo states among a statement and its
mutations, then groups the statements into equivalence classes modulo states,
and uses only one process to represent each class. In this way, we can
significantly reduce the number of split processes. Our experiments show that
our approach can further accelerate mutation analysis on top of split-stream
execution with a speedup of 2.56x on average.Comment: Submitted to conferenc
Empirical Evaluation of Test Coverage for Functional Programs
The correlation between test coverage and test effectiveness is important to justify the use of coverage in practice. Existing results on imperative programs mostly show that test coverage predicates effectiveness. However, since functional programs are usually structurally different from imperative ones, it is unclear whether the same result may be derived and coverage can be used as a prediction of effectiveness on functional programs. In this paper we report the first empirical study on the correlation between test coverage and test effectiveness on functional programs. We consider four types of coverage: as input coverages, statement/branch coverage and expression coverage, and as oracle coverages, count of assertions and checked coverage. We also consider two types of effectiveness: raw effectiveness and normalized effectiveness. Our results are twofold. (1) In general the findings on imperative programs still hold on functional programs, warranting the use of coverage in practice. (2) On specific coverage criteria, the results may be unexpected or different from the imperative ones, calling for further studies on functional programs
Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects
With the widespread deployment of deep neural networks (DNNs), ensuring the
reliability of DNN-based systems is of great importance. Serious reliability
issues such as system failures can be caused by numerical defects, one of the
most frequent defects in DNNs. To assure high reliability against numerical
defects, in this paper, we propose the RANUM approach including novel
techniques for three reliability assurance tasks: detection of potential
numerical defects, confirmation of potential-defect feasibility, and suggestion
of defect fixes. To the best of our knowledge, RANUM is the first approach that
confirms potential-defect feasibility with failure-exhibiting tests and
suggests fixes automatically. Extensive experiments on the benchmarks of 63
real-world DNN architectures show that RANUM outperforms state-of-the-art
approaches across the three reliability assurance tasks. In addition, when the
RANUM-generated fixes are compared with developers' fixes on open-source
projects, in 37 out of 40 cases, RANUM-generated fixes are equivalent to or
even better than human fixes.Comment: To appear at 45th International Conference on Software Engineering
(ICSE 2023), camera-ready versio
- …